What Do Your Peers Struggle With the Most?


Find out which topics and discussion materials are common among your peers.

Do "Operating Systems", "ubuntu", "memory", and "page" sound familiar? Looks like you are not the only one.

Explore how we reached that conclusion

You are the ones telling us how to do it. We collect your responses, add some magic salt (NLP), and kaboom. Here is the detailed path:

  • We fetch the topics and discussion materials in text format.
  • We use NLP to extract the stem of each word.
  • We count how frequently the tags appear.
  • We run a spellchecker over the results to recover readable words (don't forget, they were just stems).
  • We combine the results and visualize them.
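
The steps above can be sketched in a few lines using only the standard library. This is a minimal illustration, not the notebook's actual code: `STOP_WORDS`, `toy_stem`, and `count_stems` are hypothetical names, and `toy_stem` is a crude stand-in for a real stemmer such as NLTK's `PorterStemmer`. Unlike the cells below, it drops stop words before counting and never lets punctuation become a token.

```python
import re
from collections import Counter

# Hypothetical stop-word set for illustration; the notebook builds its own list.
STOP_WORDS = {"what", "is", "the", "a", "of", "in", "are", "an", "how", "to"}


def toy_stem(word):
    """Very rough stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_stems(texts, stem=toy_stem, stop_words=STOP_WORDS):
    """Tokenize, lowercase, drop stop words, stem, and count."""
    counts = Counter()
    for text in texts:
        # Keep alphanumeric runs only, so punctuation never becomes a token.
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            if token not in stop_words:
                counts[stem(token)] += 1
    return counts


questions = ["What is demand paging?", "What is the basic function of paging?"]
stem_counts = count_stems(questions)
top_stems = stem_counts.most_common(3)
```

Passing a real stemmer is a matter of swapping the `stem` argument, for example `count_stems(questions, stem=ps.stem)` with the `ps` object created in the next cell.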

Stemming Words

In [146]:
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize
import pickle
import collections
from spellchecker import SpellChecker

spell = SpellChecker()
ps = PorterStemmer()
In [147]:
with open('topics.pk', 'rb') as handle:
    topics_list = pickle.load(handle)
In [149]:
counts = {}

for doc in topics_list:
    print(doc["topic_name"])
    words = word_tokenize(doc["topic_name"])
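    for word in words:
        # Stem the lowercased token.
        word = ps.stem(word.lower())
        # Spell-correct the stem only when the checker does not offer several candidates.
        word = spell.correction(word) if not len(spell.candidates(word))>1 else word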
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
        
# print(doc["discussion_material"])
sorted_counts = sorted(counts.items(), key=lambda x: x[1], reverse=True)
Explain the main purpose of an operating system?
What is demand paging?
What are the advantages of a multiprocessor system?
What is kernel?
What are real-time systems?
What is a virtual memory?
Describe the objective of multiprogramming.
 What is time- sharing system?
 How are server systems classified?
 What is asymmetric clustering?
 What is a thread?
 Give some benefits of multithreaded programming.
 Briefly explain FCFS.
 What is RR scheduling algorithm?
 What are necessary conditions which can lead to a deadlock situation in a system?
 Enumerate the different RAID levels.
 Describe Banker’s algorithm
 What factors determine whether a detection-algorithm must be utilized in a deadlock avoidance system?
 State the main difference between logical from physical address space.
 How does dynamic loading aid in better memory space utilization?
 What are overlays?
 What is the basic function of paging?
 What is fragmentation?
 How does swapping result in better memory management?
 Give an example of a Process State.
 What is a socket?
 What is Direct Access Method?
 When does thrashing occur?
 What is the best page size when designing an operating system?
 When designing the file structure for an operating system, what attributes are considered?
 What is root partition?
 What are device drivers?
 What are the primary functions of VFS?
 What are the different types of CPU registers in a typical operating system design?
 What is the purpose of an I/O status information?
 What is multitasking?
 Explain pros and cons of a command line interface?
 What is caching?
 What is spooling?
 What is an Assembler?
 What are interrupts?
 What is GUI?
 What is preemptive multitasking?
 Why partitioning and formatting is a prerequisite to installing an operating system?
 What is plumbing/piping?
 What is NOS?
 Differentiate internal commands from external commands.
 Under DOS, what command will you type when you want to list down the files in a directory, and at the same time pause after every screen output?
 How would a file name EXAMPLEFILE.TXT appear when viewed under the DOS command console operating in Windows 98?
 What is a folder in Ubuntu?
 Explain why Ubuntu is safe and not affected by viruses?
 Explain what is Unity in Ubuntu? How can you add new entries to the launcher?
 Explain the purpose of using a libaio package in Ubuntu?
 What is the use of behavior tab in Ubuntu?
 What is the meaning of “export” command in Ubuntu?
 Explain how you can reset Unity Configuration?
 Explain how to access Terminal?
In [151]:
sorted_counts
Out[151]:
[('?', 50),
 ('what', 37),
 ('is', 28),
 ('the', 18),
 ('a', 16),
 ('of', 13),
 ('in', 12),
 ('system', 11),
 ('are', 10),
 ('explain', 8),
 ('an', 7),
 ('.', 7),
 ('how', 7),
 ('oper', 6),
 ('command', 6),
 ('ubuntu', 6),
 ('to', 5),
 ('when', 5),
 ('and', 4),
 ('you', 4),
 ('purpose', 3),
 ('page', 3),
 ('memori', 3),
 ('can', 3),
 ('differ', 3),
 ('doe', 3),
 ('design', 3),
 ('file', 3),
 (',', 3),
 ('main', 2),
 ('describe', 2),
 ('give', 2),
 ('algorithm', 2),
 ('deadlock', 2),
 ('util', 2),
 ('state', 2),
 ('from', 2),
 ('space', 2),
 ('better', 2),
 ('function', 2),
 ('access', 2),
 ('partit', 2),
 ('type', 2),
 ('multitask', 2),
 ('whi', 2),
 ('under', 2),
 ('do', 2),
 ('uniti', 2),
 ('use', 2),
 ('demand', 1),
 ('advantage', 1),
 ('multiprocessor', 1),
 ('kernel', 1),
 ('real-time', 1),
 ('virtual', 1),
 ('object', 1),
 ('multiprogram', 1),
 ('time-', 1),
 ('share', 1),
 ('server', 1),
 ('classify', 1),
 ('asymmetry', 1),
 ('cluster', 1),
 ('thread', 1),
 ('some', 1),
 ('benefit', 1),
 ('multithread', 1),
 ('program', 1),
 ('briefly', 1),
 ('fcf', 1),
 ('rr', 1),
 ('schedule', 1),
 ('necessary', 1),
 ('conduit', 1),
 ('which', 1),
 ('lead', 1),
 ('situate', 1),
 ('enumer', 1),
 ('raid', 1),
 ('level', 1),
 ('banker', 1),
 ('’', 1),
 ('s', 1),
 ('factor', 1),
 ('determine', 1),
 ('whether', 1),
 ('detection-algorithm', 1),
 ('must', 1),
 ('be', 1),
 ('avoid', 1),
 ('between', 1),
 ('logic', 1),
 ('physic', 1),
 ('address', 1),
 ('dynam', 1),
 ('load', 1),
 ('aid', 1),
 ('overlay', 1),
 ('basic', 1),
 ('fragment', 1),
 ('swap', 1),
 ('result', 1),
 ('manag', 1),
 ('example', 1),
 ('process', 1),
 ('socket', 1),
 ('direct', 1),
 ('method', 1),
 ('thrash', 1),
 ('occur', 1),
 ('best', 1),
 ('size', 1),
 ('structure', 1),
 ('for', 1),
 ('attribute', 1),
 ('consid', 1),
 ('root', 1),
 ('devic', 1),
 ('driver', 1),
 ('primary', 1),
 ('vf', 1),
 ('cpu', 1),
 ('regist', 1),
 ('topic', 1),
 ('i/o', 1),
 ('statu', 1),
 ('inform', 1),
 ('pro', 1),
 ('con', 1),
 ('line', 1),
 ('interfac', 1),
 ('cach', 1),
 ('spool', 1),
 ('assembl', 1),
 ('interrupt', 1),
 ('gui', 1),
 ('preemptive', 1),
 ('format', 1),
 ('prerequisite', 1),
 ('instal', 1),
 ('plumbing/pip', 1),
 ('no', 1),
 ('different', 1),
 ('intern', 1),
 ('extern', 1),
 ('will', 1),
 ('want', 1),
 ('list', 1),
 ('down', 1),
 ('directori', 1),
 ('at', 1),
 ('same', 1),
 ('time', 1),
 ('paus', 1),
 ('after', 1),
 ('everi', 1),
 ('screen', 1),
 ('output', 1),
 ('would', 1),
 ('name', 1),
 ('examplefile.txt', 1),
 ('appear', 1),
 ('view', 1),
 ('consol', 1),
 ('window', 1),
 ('98', 1),
 ('folder', 1),
 ('safe', 1),
 ('not', 1),
 ('affect', 1),
 ('by', 1),
 ('virus', 1),
 ('add', 1),
 ('new', 1),
 ('entri', 1),
 ('launcher', 1),
 ('libaio', 1),
 ('package', 1),
 ('behavior', 1),
 ('tab', 1),
 ('mean', 1),
 ('“', 1),
 ('export', 1),
 ('”', 1),
 ('reset', 1),
 ('configure', 1),
 ('termin', 1)]
In [152]:
stop_words = ['?', 'explain', 'what', 'is', 'the', 'a', 'of',
              'in', 'are', 'an', '.', 'it', 'for', 'thi', 'with',
              'be', '–', 'how', 'to', 'when', 'and', 'you', 'can',
              ',', 'from', 'better', 'do', 'that', 'by',  'on', 'as',
              'or', ')', '('
             ]
In [153]:
# Zero out the stop words so they sink to the bottom of the sorted list.
for i in stop_words:
    counts[i]=0

sorted_counts = sorted(counts.items(), key=lambda x: x[1], reverse=True)
In [154]:
sorted_counts
Out[154]:
[('system', 11),
 ('oper', 6),
 ('command', 6),
 ('ubuntu', 6),
 ('purpose', 3),
 ('page', 3),
 ('memori', 3),
 ('differ', 3),
 ('doe', 3),
 ('design', 3),
 ('file', 3),
 ('main', 2),
 ('describe', 2),
 ('give', 2),
 ('algorithm', 2),
 ('deadlock', 2),
 ('util', 2),
 ('state', 2),
 ('space', 2),
 ('function', 2),
 ('access', 2),
 ('partit', 2),
 ('type', 2),
 ('multitask', 2),
 ('whi', 2),
 ('under', 2),
 ('uniti', 2),
 ('use', 2),
 ('demand', 1),
 ('advantage', 1),
 ('multiprocessor', 1),
 ('kernel', 1),
 ('real-time', 1),
 ('virtual', 1),
 ('object', 1),
 ('multiprogram', 1),
 ('time-', 1),
 ('share', 1),
 ('server', 1),
 ('classify', 1),
 ('asymmetry', 1),
 ('cluster', 1),
 ('thread', 1),
 ('some', 1),
 ('benefit', 1),
 ('multithread', 1),
 ('program', 1),
 ('briefly', 1),
 ('fcf', 1),
 ('rr', 1),
 ('schedule', 1),
 ('necessary', 1),
 ('conduit', 1),
 ('which', 1),
 ('lead', 1),
 ('situate', 1),
 ('enumer', 1),
 ('raid', 1),
 ('level', 1),
 ('banker', 1),
 ('’', 1),
 ('s', 1),
 ('factor', 1),
 ('determine', 1),
 ('whether', 1),
 ('detection-algorithm', 1),
 ('must', 1),
 ('avoid', 1),
 ('between', 1),
 ('logic', 1),
 ('physic', 1),
 ('address', 1),
 ('dynam', 1),
 ('load', 1),
 ('aid', 1),
 ('overlay', 1),
 ('basic', 1),
 ('fragment', 1),
 ('swap', 1),
 ('result', 1),
 ('manag', 1),
 ('example', 1),
 ('process', 1),
 ('socket', 1),
 ('direct', 1),
 ('method', 1),
 ('thrash', 1),
 ('occur', 1),
 ('best', 1),
 ('size', 1),
 ('structure', 1),
 ('attribute', 1),
 ('consid', 1),
 ('root', 1),
 ('devic', 1),
 ('driver', 1),
 ('primary', 1),
 ('vf', 1),
 ('cpu', 1),
 ('regist', 1),
 ('topic', 1),
 ('i/o', 1),
 ('statu', 1),
 ('inform', 1),
 ('pro', 1),
 ('con', 1),
 ('line', 1),
 ('interfac', 1),
 ('cach', 1),
 ('spool', 1),
 ('assembl', 1),
 ('interrupt', 1),
 ('gui', 1),
 ('preemptive', 1),
 ('format', 1),
 ('prerequisite', 1),
 ('instal', 1),
 ('plumbing/pip', 1),
 ('no', 1),
 ('different', 1),
 ('intern', 1),
 ('extern', 1),
 ('will', 1),
 ('want', 1),
 ('list', 1),
 ('down', 1),
 ('directori', 1),
 ('at', 1),
 ('same', 1),
 ('time', 1),
 ('paus', 1),
 ('after', 1),
 ('everi', 1),
 ('screen', 1),
 ('output', 1),
 ('would', 1),
 ('name', 1),
 ('examplefile.txt', 1),
 ('appear', 1),
 ('view', 1),
 ('consol', 1),
 ('window', 1),
 ('98', 1),
 ('folder', 1),
 ('safe', 1),
 ('not', 1),
 ('affect', 1),
 ('virus', 1),
 ('add', 1),
 ('new', 1),
 ('entri', 1),
 ('launcher', 1),
 ('libaio', 1),
 ('package', 1),
 ('behavior', 1),
 ('tab', 1),
 ('mean', 1),
 ('“', 1),
 ('export', 1),
 ('”', 1),
 ('reset', 1),
 ('configure', 1),
 ('termin', 1),
 ('explain', 0),
 ('the', 0),
 ('of', 0),
 ('an', 0),
 ('?', 0),
 ('what', 0),
 ('is', 0),
 ('are', 0),
 ('a', 0),
 ('.', 0),
 ('how', 0),
 ('can', 0),
 ('to', 0),
 ('in', 0),
 ('be', 0),
 ('from', 0),
 ('better', 0),
 ('when', 0),
 ('for', 0),
 (',', 0),
 ('and', 0),
 ('do', 0),
 ('you', 0),
 ('by', 0),
 ('it', 0),
 ('thi', 0),
 ('with', 0),
 ('–', 0),
 ('that', 0),
 ('on', 0),
 ('as', 0),
 ('or', 0),
 (')', 0),
 ('(', 0)]

Discussion Material

In [155]:
counts_disc = {}

for doc in topics_list:
    print(doc["discussion_material"])
    words = word_tokenize(doc["discussion_material"])
    for word in words:
        word = ps.stem(word.lower())
        word = spell.correction(word) if not len(spell.candidates(word))>1 else word
        if word in counts_disc:
            counts_disc[word] += 1
        else:
            counts_disc[word] = 1
In [156]:
# Zero out the stop words so they sink to the bottom of the sorted list.
for i in stop_words:
    counts_disc[i]=0

sorted_counts_disc = sorted(counts_disc.items(), key=lambda x: x[1], reverse=True)
In [157]:
sorted_counts_disc
Out[157]:
[('system', 36),
 ('process', 23),
 ('file', 18),
 ('oper', 17),
 ('memori', 14),
 ('command', 14),
 ('time', 13),
 ('page', 11),
 ('use', 11),
 ('user', 11),
 ('one', 10),
 ('cpu', 9),
 ('program', 8),
 ('applix', 8),
 ('have', 8),
 ('run', 8),
 ('interfac', 8),
 ('alloc', 8),
 ('allow', 8),
 ('i/o', 8),
 ('provid', 7),
 ('at', 7),
 ('access', 7),
 ('other', 7),
 ('launcher', 7),
 ('main', 6),
 ('not', 6),
 ('there', 6),
 ('copi', 6),
 ('such', 6),
 ('wait', 6),
 ('mean', 6),
 ('devic', 6),
 ('compute', 5),
 ('manag', 5),
 ('activ', 5),
 ('execute', 5),
 ('all', 5),
 ('processor', 5),
 ('also', 5),
 ('type', 5),
 ('need', 5),
 ('server', 5),
 ('take', 5),
 ('unit', 5),
 ('algorithm', 5),
 (':', 5),
 ('no', 5),
 ('address', 5),
 ('size', 5),
 ('dir', 5),
 ('design', 4),
 ('perform', 4),
 ('well', 4),
 ('refer', 4),
 ('into', 4),
 ('increase', 4),
 ('more', 4),
 ('commun', 4),
 ('util', 4),
 ('interact', 4),
 ('each', 4),
 ('send', 4),
 ('request', 4),
 ('creat', 4),
 ('thread', 4),
 ('gener', 4),
 ('queue', 4),
 ('occur', 4),
 ('name', 4),
 ('under', 4),
 ('mani', 4),
 ('will', 4),
 ('code', 4),
 ('ani', 4),
 ('termin', 4),
 ('differ', 4),
 ('separ', 4),
 ('which', 4),
 ('include', 4),
 ('interrupt', 4),
 ('complet', 4),
 ('drive', 4),
 ('anoth', 3),
 ('environs', 3),
 ('’', 3),
 ('s', 3),
 ('then', 3),
 ('disk', 3),
 ('becaus', 3),
 ('share', 3),
 ('resource', 3),
 ('kernel', 3),
 ('data', 3),
 ('between', 3),
 ('software', 3),
 ('hardware', 3),
 ('defin', 3),
 ('especi', 3),
 ('fit', 3),
 ('physic', 3),
 ('switch', 3),
 ('known', 3),
 ('multitask', 3),
 ('so', 3),
 ('short', 3),
 ('first', 3),
 ('case', 3),
 ('avail', 3),
 ('action', 3),
 ('machin', 3),
 ('state', 3),
 ('set', 3),
 ('schedule', 3),
 ('implement', 3),
 ('way', 3),
 ('deadlock', 3),
 (';', 3),
 ('parityraid', 3),
 ('amount', 3),
 ('order', 3),
 ('instruct', 3),
 ('problem', 3),
 ('back', 3),
 ('store', 3),
 ('if', 3),
 ('base', 3),
 ('inform', 3),
 ('instead', 3),
 ('come', 3),
 ('structure', 3),
 ('support', 3),
 ('network', 3),
 ('open', 3),
 ('line', 3),
 ('printer', 3),
 ('assembl', 3),
 ('language', 3),
 ('graphic', 3),
 ('icon', 3),
 ('without', 3),
 ('folder', 3),
 ('ubuntu', 3),
 ('linx', 3),
 ('uniti', 3),
 ('option', 3),
 ('export', 3),
 ('exist', 2),
 ('two', 2),
 ('purpose', 2),
 ('make', 2),
 ('ram', 2),
 ('require', 2),
 ('number', 2),
 ('consider', 2),
 ('they', 2),
 ('overall', 2),
 ('reliabl', 2),
 ('connect', 2),
 ('ha', 2),
 ('fix', 2),
 ('time-share', 2),
 ('multipl', 2),
 ('job', 2),
 ('them', 2),
 ('happen', 2),
 ('fast', 2),
 ('smp', 2),
 ('form', 2),
 ('these', 2),
 ('client', 2),
 ('hot', 2),
 ('where', 2),
 ('doe', 2),
 ('basic', 2),
 ('regist', 2),
 ('stack', 2),
 ('within', 2),
 ('scheme', 2),
 ('circular', 2),
 ('around', 2),
 ('interv', 2),
 ('up', 2),
 ('conduit', 2),
 ('block-interleav', 2),
 ('bank', 2),
 ('wherein', 2),
 ('like', 2),
 ('load', 2),
 ('routin', 2),
 ('call', 2),
 ('method', 2),
 ('larg', 2),
 ('enable', 2),
 ('onli', 2),
 ('given', 2),
 ('space', 2),
 ('vari', 2),
 ('intern', 2),
 ('we', 2),
 ('deal', 2),
 ('extern', 2),
 ('dure', 2),
 ('new', 2),
 ('socket', 2),
 ('direct', 2),
 ('block', 2),
 ('written', 2),
 ('advantage', 2),
 ('instanc', 2),
 ('high', 2),
 ('best', 2),
 ('singl', 2),
 ('effici', 2),
 ('locat', 2),
 ('partit', 2),
 ('contain', 2),
 ('vf', 2),
 ('particular', 2),
 ('show', 2),
 ('same', 2),
 ('“', 2),
 ('behind', 2),
 ('”', 2),
 ('result', 2),
 ('find', 2),
 ('peopl', 2),
 ('cach', 2),
 ('limit', 2),
 ('spool', 2),
 ('assoc', 2),
 ('print', 2),
 ('want', 2),
 ('output', 2),
 ('translat', 2),
 ('part', 2),
 ('gui', 2),
 ('over', 2),
 ('sent', 2),
 ('/w', 2),
 ('filename', 2),
 ('appear', 2),
 ('e-mail', 2),
 ('go', 2),
 ('through', 2),
 ('secur', 2),
 ('o.', 2),
 ('shell', 2),
 ('start', 2),
 ('a/o', 2),
 ('desktop', 2),
 ('bash', 2),
 ('variabl', 2),
 ('hit', 2),
 ('-', 2),
 ('>', 2),
 ('sure', 1),
 ('develop', 1),
 ('demand', 1),
 ('os', 1),
 ('bring', 1),
 ('miss', 1),
 ('throughput', 1),
 ('save', 1),
 ('money', 1),
 ('final', 1),
 ('core', 1),
 ('everi', 1),
 ('actual', 1),
 ('compon', 1),
 ('ensur', 1),
 ('usable', 1),
 ('real-time', 1),
 ('rigid', 1),
 ('been', 1),
 ('place', 1),
 ('constraint', 1),
 ('primi', 1),
 ('player', 1),
 ('placeholdervirtu', 1),
 ('technique', 1),
 ('let', 1),
 ('outsid', 1),
 ('veri', 1),
 ('object', 1),
 ('multiprogram', 1),
 ('said', 1),
 ('maxim', 1),
 ('among', 1),
 ('while', 1),
 ('running.9', 1),
 ('symmetry', 1),
 ('multi-processor', 1),
 ('most', 1),
 ('common', 1),
 ('multiple-processor', 1),
 ('ident', 1),
 ('classify', 1),
 ('either', 1),
 ('computer-serv', 1),
 ('made', 1),
 ('second', 1),
 ('provis', 1),
 ('update', 1),
 ('asymmetry', 1),
 ('cluster', 1),
 ('standbi', 1),
 ('mode', 1),
 ('noth', 1),
 ('but', 1),
 ('monitor', 1),
 ('role', 1),
 ('should', 1),
 ('fail', 1),
 ('compos', 1),
 ('id', 1),
 ('counter', 1),
 ('respons', 1),
 ('user–', 1),
 ('economy–', 1),
 ('multiprocessor', 1),
 ('architecture', 1),
 ('fcf', 1),
 ('stand', 1),
 ('first-com', 1),
 ('first-serv', 1),
 ('fifo', 1),
 ('rr', 1),
 ('round-robin', 1),
 ('primarily', 1),
 ('aim', 1),
 ('setup', 1),
 ('goe', 1),
 ('10', 1),
 ('100', 1),
 ('millisecond', 1),
 ('situate', 1),
 ('four', 1),
 ('simultan', 1),
 ('mutual', 1),
 ('exclus', 1),
 ('hold', 1),
 ('pre-emption', 1),
 ('raid', 1),
 ('0', 1),
 ('non-redund', 1),
 ('stripingraid', 1),
 ('1', 1),
 ('mirror', 1),
 ('disksraid', 1),
 ('2', 1),
 ('memory-styl', 1),
 ('error-correct', 1),
 ('codesraid', 1),
 ('3', 1),
 ('bit-interleav', 1),
 ('4', 1),
 ('5', 1),
 ('distribute', 1),
 ('6', 1),
 ('p+q', 1),
 ('redund', 1),
 ('bankers_algorithmbank', 1),
 ('algorithmbank', 1),
 ('deadlock-avoid', 1),
 ('get', 1),
 ('never', 1),
 ('cash', 1),
 ('longer', 1),
 ('satisfy', 1),
 ('custom', 1),
 ('depend', 1),
 ('often', 1),
 ('affect', 1),
 ('appli', 1),
 ('logic', 1),
 ('hand', 1),
 ('seen', 1),
 ('dynam', 1),
 ('until', 1),
 ('handl', 1),
 ('infrequ', 1),
 ('error', 1),
 ('overlay', 1),
 ('larger', 1),
 ('than', 1),
 ('idea', 1),
 ('kept', 1),
 ('permit', 1),
 ('noncontigu', 1),
 ('avoid', 1),
 ('chunk', 1),
 ('onto', 1),
 ('fragment', 1),
 ('wast', 1),
 ('fixed-s', 1),
 ('variables', 1),
 ('regular', 1),
 ('later', 1),
 ('swap', 1),
 ('created', 1),
 ('executed', 1),
 ('certain', 1),
 ('event', 1),
 ('occur–', 1),
 ('readi', 1),
 ('processor–', 1),
 ('stop', 1),
 ('abruptly', 1),
 ('endpoint', 1),
 ('model', 1),
 ('view', 1),
 ('sequenc', 1),
 ('record', 1),
 ('arbitrary', 1),
 ('read', 1),
 ('thrash', 1),
 ('spend', 1),
 ('factor', 1),
 ('consid', 1),
 ('suitabl', 1),
 ('tabl', 1),
 ('effect', 1),
 ('topic', 1),
 ('attribute', 1),
 ('identify', 1),
 ('level', 1),
 ('protect', 1),
 ('root', 1),
 ('potenti', 1),
 ('import', 1),
 ('mount', 1),
 ('boot', 1),
 ('driver', 1),
 ('standard', 1),
 ('repress', 1),
 ('mayb', 1),
 ('manufacture', 1),
 ('compani', 1),
 ('prevent', 1),
 ('conflict', 1),
 ('whenev', 1),
 ('incorpor', 1),
 ('virtual', 1),
 ('their', 1),
 ('clean', 1),
 ('file-represent', 1),
 ('vnode', 1),
 ('numer', 1),
 ('accumulators', 1),
 ('index', 1),
 ('registers', 1),
 ('pointer–', 1),
 ('statu', 1),
 ('about', 1),
 ('sever', 1),
 ('howev', 1),
 ('although', 1),
 ('some', 1),
 ('scene', 1),
 ('immedi', 1),
 ('season', 1),
 ('accustom', 1),
 ('quicker', 1),
 ('simpler.howev', 1),
 ('familiar', 1),
 ('parapet', 1),
 ('downside', 1),
 ('who', 1),
 ('fond', 1),
 ('memor', 1),
 ('region', 1),
 ('usual', 1),
 ('much', 1),
 ('speed', 1),
 ('normal', 1),
 ('accordingly', 1),
 ('act', 1),
 ('low-level', 1),
 ('mnemon', 1),
 ('meehan', 1),
 ('notif', 1),
 ('gain', 1),
 ('handler', 1),
 ('receive', 1),
 ('signal', 1),
 ('tell', 1),
 ('symbol', 1),
 ('easier', 1),
 ('mous', 1),
 ('remem', 1),
 ('click', 1),
 ('button', 1),
 ('preemptive', 1),
 ('turn', 1),
 ('necessarily', 1),
 ('control', 1),
 ('crash', 1),
 ('format', 1),
 ('preparatory', 1),
 ('instal', 1),
 ('properly', 1),
 ('determine', 1),
 ('appropri', 1),
 ('input', 1),
 ('example', 1),
 ('list', 1),
 ('screen', 1),
 ('pipe', 1),
 ('produc', 1),
 ('hard', 1),
 ('special', 1),
 ('file/fold', 1),
 ('built-in', 1),
 ('already', 1),
 ('directori', 1),
 ('/wb', 1),
 ('/pc', 1),
 ('/sd', 1),
 ('answer', 1),
 ('d', 1),
 ('/p', 1),
 ('would', 1),
 ('exampl~1.txt', 1),
 ('reason', 1),
 ('8', 1),
 ('charact', 1),
 ('work', 1),
 ('concept', 1),
 ('everyth', 1),
 ('your', 1),
 ('malice', 1),
 ('content', 1),
 ('before', 1),
 ('checksubuntu', 1),
 ('super', 1),
 ('systemunlik', 1),
 ('countless', 1),
 ('see', 1),
 ('anymalwar', 1),
 ('virus', 1),
 ('weak', 1),
 ('window', 1),
 ('default', 1),
 ('left', 1),
 ('side', 1),
 ('introduce', 1),
 ('dash', 1),
 ('programs.in', 1),
 ('add', 1),
 ('entri', 1),
 ('.desktop', 1),
 ('drag', 1),
 ('libaio', 1),
 ('asynchrony', 1),
 ('even', 1),
 ('overlap', 1),
 ('submit', 1),
 ('reap', 1),
 ('group', 1),
 ('behavior', 1),
 ('tab', 1),
 ('chang', 1),
 ('desktopauto-hid', 1),
 ('reveal', 1),
 ('move', 1),
 ('pointer', 1),
 ('spot.en', 1),
 ('workspace', 1),
 ('check', 1),
 ('workspaceadd', 1),
 ('display', 1),
 ('tri', 1),
 ('visibl', 1),
 ('subprocess', 1),
 ('sub-process', 1),
 ('reset', 1),
 ('configure', 1),
 ('simplest', 1),
 ('atl-f2', 1),
 ('#', 1),
 ('–reset', 1),
 ('menu', 1),
 ('accessory', 1),
 ('for', 0),
 ('.', 0),
 ('is', 0),
 ('that', 0),
 ('it', 0),
 ('to', 0),
 ('a', 0),
 ('by', 0),
 ('an', 0),
 ('the', 0),
 ('and', 0),
 ('of', 0),
 ('when', 0),
 ('are', 0),
 ('in', 0),
 (',', 0),
 ('(', 0),
 (')', 0),
 ('from', 0),
 ('with', 0),
 ('can', 0),
 ('as', 0),
 ('on', 0),
 ('thi', 0),
 ('be', 0),
 ('what', 0),
 ('?', 0),
 ('or', 0),
 ('–', 0),
 ('how', 0),
 ('do', 0),
 ('you', 0),
 ('explain', 0),
 ('better', 0)]

Word Cloud

In [111]:
# Start with loading all necessary libraries
import numpy as np
import pandas as pd
from os import path
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

import matplotlib.pyplot as plt
In [112]:
?WordCloud
In [158]:
create_para = []
for word, count in sorted_counts:
    # Repeat each word `count` times so the word cloud sizes it by frequency.
    create_para.extend([word] * count)
create_para
Out[158]:
['system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'oper',
 'oper',
 'oper',
 'oper',
 'oper',
 'oper',
 'command',
 'command',
 'command',
 'command',
 'command',
 'command',
 'ubuntu',
 'ubuntu',
 'ubuntu',
 'ubuntu',
 'ubuntu',
 'ubuntu',
 'purpose',
 'purpose',
 'purpose',
 'page',
 'page',
 'page',
 'memori',
 'memori',
 'memori',
 'differ',
 'differ',
 'differ',
 'doe',
 'doe',
 'doe',
 'design',
 'design',
 'design',
 'file',
 'file',
 'file',
 'main',
 'main',
 'describe',
 'describe',
 'give',
 'give',
 'algorithm',
 'algorithm',
 'deadlock',
 'deadlock',
 'util',
 'util',
 'state',
 'state',
 'space',
 'space',
 'function',
 'function',
 'access',
 'access',
 'partit',
 'partit',
 'type',
 'type',
 'multitask',
 'multitask',
 'whi',
 'whi',
 'under',
 'under',
 'uniti',
 'uniti',
 'use',
 'use',
 'demand',
 'advantage',
 'multiprocessor',
 'kernel',
 'real-time',
 'virtual',
 'object',
 'multiprogram',
 'time-',
 'share',
 'server',
 'classify',
 'asymmetry',
 'cluster',
 'thread',
 'some',
 'benefit',
 'multithread',
 'program',
 'briefly',
 'fcf',
 'rr',
 'schedule',
 'necessary',
 'conduit',
 'which',
 'lead',
 'situate',
 'enumer',
 'raid',
 'level',
 'banker',
 '’',
 's',
 'factor',
 'determine',
 'whether',
 'detection-algorithm',
 'must',
 'avoid',
 'between',
 'logic',
 'physic',
 'address',
 'dynam',
 'load',
 'aid',
 'overlay',
 'basic',
 'fragment',
 'swap',
 'result',
 'manag',
 'example',
 'process',
 'socket',
 'direct',
 'method',
 'thrash',
 'occur',
 'best',
 'size',
 'structure',
 'attribute',
 'consid',
 'root',
 'devic',
 'driver',
 'primary',
 'vf',
 'cpu',
 'regist',
 'topic',
 'i/o',
 'statu',
 'inform',
 'pro',
 'con',
 'line',
 'interfac',
 'cach',
 'spool',
 'assembl',
 'interrupt',
 'gui',
 'preemptive',
 'format',
 'prerequisite',
 'instal',
 'plumbing/pip',
 'no',
 'different',
 'intern',
 'extern',
 'will',
 'want',
 'list',
 'down',
 'directori',
 'at',
 'same',
 'time',
 'paus',
 'after',
 'everi',
 'screen',
 'output',
 'would',
 'name',
 'examplefile.txt',
 'appear',
 'view',
 'consol',
 'window',
 '98',
 'folder',
 'safe',
 'not',
 'affect',
 'virus',
 'add',
 'new',
 'entri',
 'launcher',
 'libaio',
 'package',
 'behavior',
 'tab',
 'mean',
 '“',
 'export',
 '”',
 'reset',
 'configure',
 'termin']
In [167]:
# Create and generate a word cloud image:
wordcloud = WordCloud(collocations=False).generate(" ".join(create_para))

# Display the generated image:
plt.figure(figsize=(100,50))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.savefig("worldcloud_topics.png")
plt.show()
In [163]:
create_para_disc = []
for word, count in sorted_counts_disc:
    # Repeat each word `count` times so the word cloud sizes it by frequency.
    create_para_disc.extend([word] * count)

create_para_disc
Out[163]:
['system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'system',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'process',
 'file',
 'file',
 'file',
 'file',
 'file',
 'file',
 'file',
 'file',
 'file',
 'file',
 'file',
 'file',
 'file',
 'file',
 'file',
 'file',
 'file',
 'file',
 'oper',
 'oper',
 'oper',
 'oper',
 'oper',
 'oper',
 'oper',
 'oper',
 'oper',
 'oper',
 'oper',
 'oper',
 'oper',
 'oper',
 'oper',
 'oper',
 'oper',
 'memori',
 'memori',
 'memori',
 'memori',
 'memori',
 'memori',
 'memori',
 'memori',
 'memori',
 'memori',
 'memori',
 'memori',
 'memori',
 'memori',
 'command',
 'command',
 'command',
 'command',
 'command',
 'command',
 'command',
 'command',
 'command',
 'command',
 'command',
 'command',
 'command',
 'command',
 'time',
 'time',
 'time',
 'time',
 'time',
 'time',
 'time',
 'time',
 'time',
 'time',
 'time',
 'time',
 'time',
 'page',
 'page',
 'page',
 'page',
 'page',
 'page',
 'page',
 'page',
 'page',
 'page',
 'page',
 'use',
 'use',
 'use',
 'use',
 'use',
 'use',
 'use',
 'use',
 'use',
 'use',
 'use',
 'user',
 'user',
 'user',
 'user',
 'user',
 'user',
 'user',
 'user',
 'user',
 'user',
 'user',
 'one',
 'one',
 'one',
 'one',
 'one',
 'one',
 'one',
 'one',
 'one',
 'one',
 'cpu',
 'cpu',
 'cpu',
 'cpu',
 'cpu',
 'cpu',
 'cpu',
 'cpu',
 'cpu',
 'program',
 'program',
 'program',
 'program',
 'program',
 'program',
 'program',
 'program',
 'applix',
 'applix',
 'applix',
 'applix',
 'applix',
 'applix',
 'applix',
 'applix',
 'have',
 'have',
 'have',
 'have',
 'have',
 'have',
 'have',
 'have',
 'run',
 'run',
 'run',
 'run',
 'run',
 'run',
 'run',
 'run',
 'interfac',
 'interfac',
 'interfac',
 'interfac',
 'interfac',
 'interfac',
 'interfac',
 'interfac',
 'alloc',
 'alloc',
 'alloc',
 'alloc',
 'alloc',
 'alloc',
 'alloc',
 'alloc',
 'allow',
 'allow',
 'allow',
 'allow',
 'allow',
 'allow',
 'allow',
 'allow',
 'i/o',
 'i/o',
 'i/o',
 'i/o',
 'i/o',
 'i/o',
 'i/o',
 'i/o',
 'provid',
 'provid',
 'provid',
 'provid',
 'provid',
 'provid',
 'provid',
 'at',
 'at',
 'at',
 'at',
 'at',
 'at',
 'at',
 'access',
 'access',
 'access',
 'access',
 'access',
 'access',
 'access',
 'other',
 'other',
 'other',
 'other',
 'other',
 'other',
 'other',
 'launcher',
 'launcher',
 'launcher',
 'launcher',
 'launcher',
 'launcher',
 'launcher',
 'main',
 'main',
 'main',
 'main',
 'main',
 'main',
 'not',
 'not',
 'not',
 'not',
 'not',
 'not',
 'there',
 'there',
 'there',
 'there',
 'there',
 'there',
 'copi',
 'copi',
 'copi',
 'copi',
 'copi',
 'copi',
 'such',
 'such',
 'such',
 'such',
 'such',
 'such',
 'wait',
 'wait',
 'wait',
 'wait',
 'wait',
 'wait',
 'mean',
 'mean',
 'mean',
 'mean',
 'mean',
 'mean',
 'devic',
 'devic',
 'devic',
 'devic',
 'devic',
 'devic',
 'compute',
 'compute',
 'compute',
 'compute',
 'compute',
 'manag',
 'manag',
 'manag',
 'manag',
 'manag',
 'activ',
 'activ',
 'activ',
 'activ',
 'activ',
 'execute',
 'execute',
 'execute',
 'execute',
 'execute',
 'all',
 'all',
 'all',
 'all',
 'all',
 'processor',
 'processor',
 'processor',
 'processor',
 'processor',
 'also',
 'also',
 'also',
 'also',
 'also',
 'type',
 'type',
 'type',
 'type',
 'type',
 'need',
 'need',
 'need',
 'need',
 'need',
 'server',
 'server',
 'server',
 'server',
 'server',
 'take',
 'take',
 'take',
 'take',
 'take',
 'unit',
 'unit',
 'unit',
 'unit',
 'unit',
 'algorithm',
 'algorithm',
 'algorithm',
 'algorithm',
 'algorithm',
 ':',
 ':',
 ':',
 ':',
 ':',
 'no',
 'no',
 'no',
 'no',
 'no',
 'address',
 'address',
 'address',
 'address',
 'address',
 'size',
 'size',
 'size',
 'size',
 'size',
 'dir',
 'dir',
 'dir',
 'dir',
 'dir',
 'design',
 'design',
 'design',
 'design',
 'perform',
 'perform',
 'perform',
 'perform',
 'well',
 'well',
 'well',
 'well',
 'refer',
 'refer',
 'refer',
 'refer',
 'into',
 'into',
 'into',
 'into',
 'increase',
 'increase',
 'increase',
 'increase',
 'more',
 'more',
 'more',
 'more',
 'commun',
 'commun',
 'commun',
 'commun',
 'util',
 'util',
 'util',
 'util',
 'interact',
 'interact',
 'interact',
 'interact',
 'each',
 'each',
 'each',
 'each',
 'send',
 'send',
 'send',
 'send',
 'request',
 'request',
 'request',
 'request',
 'creat',
 'creat',
 'creat',
 'creat',
 'thread',
 'thread',
 'thread',
 'thread',
 'gener',
 'gener',
 'gener',
 'gener',
 'queue',
 'queue',
 'queue',
 'queue',
 'occur',
 'occur',
 'occur',
 'occur',
 'name',
 'name',
 'name',
 'name',
 'under',
 'under',
 'under',
 'under',
 'mani',
 'mani',
 'mani',
 'mani',
 'will',
 'will',
 'will',
 'will',
 'code',
 'code',
 'code',
 'code',
 'ani',
 'ani',
 'ani',
 'ani',
 'termin',
 'termin',
 'termin',
 'termin',
 'differ',
 'differ',
 'differ',
 'differ',
 'separ',
 'separ',
 'separ',
 'separ',
 'which',
 'which',
 'which',
 'which',
 'include',
 'include',
 'include',
 'include',
 'interrupt',
 'interrupt',
 'interrupt',
 'interrupt',
 'complet',
 'complet',
 'complet',
 'complet',
 'drive',
 'drive',
 'drive',
 'drive',
 'anoth',
 'anoth',
 'anoth',
 'environs',
 'environs',
 'environs',
 '’',
 '’',
 '’',
 's',
 's',
 's',
 'then',
 'then',
 'then',
 'disk',
 'disk',
 'disk',
 'becaus',
 'becaus',
 'becaus',
 'share',
 'share',
 'share',
 'resource',
 'resource',
 'resource',
 'kernel',
 'kernel',
 'kernel',
 'data',
 'data',
 'data',
 'between',
 'between',
 'between',
 'software',
 'software',
 'software',
 'hardware',
 'hardware',
 'hardware',
 'defin',
 'defin',
 'defin',
 'especi',
 'especi',
 'especi',
 'fit',
 'fit',
 'fit',
 'physic',
 'physic',
 'physic',
 'switch',
 'switch',
 'switch',
 'known',
 'known',
 'known',
 'multitask',
 'multitask',
 'multitask',
 'so',
 'so',
 'so',
 'short',
 'short',
 'short',
 'first',
 'first',
 'first',
 'case',
 'case',
 'case',
 'avail',
 'avail',
 'avail',
 'action',
 'action',
 'action',
 'machin',
 'machin',
 'machin',
 'state',
 'state',
 'state',
 'set',
 'set',
 'set',
 'schedule',
 'schedule',
 'schedule',
 'implement',
 'implement',
 'implement',
 'way',
 'way',
 'way',
 'deadlock',
 'deadlock',
 'deadlock',
 ';',
 ';',
 ';',
 'parityraid',
 'parityraid',
 'parityraid',
 'amount',
 'amount',
 'amount',
 'order',
 'order',
 'order',
 'instruct',
 'instruct',
 'instruct',
 'problem',
 'problem',
 'problem',
 'back',
 'back',
 'back',
 'store',
 'store',
 'store',
 'if',
 'if',
 'if',
 'base',
 'base',
 'base',
 'inform',
 'inform',
 'inform',
 'instead',
 'instead',
 'instead',
 'come',
 'come',
 'come',
 'structure',
 'structure',
 'structure',
 'support',
 'support',
 'support',
 'network',
 'network',
 'network',
 'open',
 'open',
 'open',
 'line',
 'line',
 'line',
 'printer',
 'printer',
 'printer',
 'assembl',
 'assembl',
 'assembl',
 'language',
 'language',
 'language',
 'graphic',
 'graphic',
 'graphic',
 'icon',
 'icon',
 'icon',
 'without',
 'without',
 'without',
 'folder',
 'folder',
 'folder',
 'ubuntu',
 'ubuntu',
 'ubuntu',
 'linx',
 'linx',
 'linx',
 'uniti',
 'uniti',
 'uniti',
 'option',
 'option',
 'option',
 'export',
 'export',
 'export',
 'exist',
 'exist',
 'two',
 'two',
 'purpose',
 'purpose',
 'make',
 'make',
 'ram',
 'ram',
 'require',
 'require',
 'number',
 'number',
 'consider',
 'consider',
 'they',
 'they',
 'overall',
 'overall',
 'reliabl',
 'reliabl',
 'connect',
 'connect',
 'ha',
 'ha',
 'fix',
 'fix',
 'time-share',
 'time-share',
 'multipl',
 'multipl',
 'job',
 'job',
 'them',
 'them',
 'happen',
 'happen',
 'fast',
 'fast',
 'smp',
 'smp',
 'form',
 'form',
 'these',
 'these',
 'client',
 'client',
 'hot',
 'hot',
 'where',
 'where',
 'doe',
 'doe',
 'basic',
 'basic',
 'regist',
 'regist',
 'stack',
 'stack',
 'within',
 'within',
 'scheme',
 'scheme',
 'circular',
 'circular',
 'around',
 'around',
 'interv',
 'interv',
 'up',
 'up',
 'conduit',
 'conduit',
 'block-interleav',
 'block-interleav',
 'bank',
 'bank',
 'wherein',
 'wherein',
 'like',
 'like',
 'load',
 'load',
 'routin',
 'routin',
 'call',
 'call',
 'method',
 'method',
 'larg',
 'larg',
 'enable',
 'enable',
 'onli',
 'onli',
 'given',
 'given',
 'space',
 'space',
 'vari',
 'vari',
 'intern',
 'intern',
 'we',
 'we',
 'deal',
 'deal',
 'extern',
 'extern',
 'dure',
 'dure',
 'new',
 'new',
 'socket',
 'socket',
 'direct',
 'direct',
 'block',
 'block',
 'written',
 'written',
 'advantage',
 'advantage',
 'instanc',
 'instanc',
 'high',
 'high',
 'best',
 'best',
 'singl',
 'singl',
 'effici',
 'effici',
 'locat',
 'locat',
 'partit',
 'partit',
 'contain',
 'contain',
 'vf',
 'vf',
 'particular',
 'particular',
 'show',
 'show',
 'same',
 'same',
 '“',
 '“',
 'behind',
 'behind',
 '”',
 '”',
 'result',
 'result',
 'find',
 'find',
 'peopl',
 'peopl',
 'cach',
 'cach',
 'limit',
 'limit',
 'spool',
 'spool',
 'assoc',
 'assoc',
 'print',
 'print',
 'want',
 'want',
 'output',
 'output',
 'translat',
 'translat',
 'part',
 'part',
 'gui',
 'gui',
 'over',
 'over',
 'sent',
 'sent',
 '/w',
 '/w',
 'filename',
 'filename',
 'appear',
 'appear',
 'e-mail',
 'e-mail',
 'go',
 'go',
 'through',
 'through',
 'secur',
 'secur',
 'o.',
 'o.',
 'shell',
 'shell',
 'start',
 'start',
 'a/o',
 'a/o',
 'desktop',
 'desktop',
 'bash',
 'bash',
 'variabl',
 'variabl',
 'hit',
 'hit',
 '-',
 '-',
 '>',
 '>',
 'sure',
 'develop',
 'demand',
 'os',
 'bring',
 'miss',
 'throughput',
 'save',
 'money',
 'final',
 'core',
 'everi',
 'actual',
 'compon',
 'ensur',
 'usable',
 'real-time',
 'rigid',
 'been',
 'place',
 'constraint',
 'primi',
 'player',
 'placeholdervirtu',
 'technique',
 'let',
 'outsid',
 'veri',
 'object',
 'multiprogram',
 'said',
 'maxim',
 'among',
 'while',
 'running.9',
 'symmetry',
 'multi-processor',
 'most',
 ...]
In [166]:
# Create and generate a word cloud image:
wordcloud_disc = WordCloud(collocations=False).generate(" ".join(create_para_disc))

# Display the generated image:
plt.figure(figsize=(100,50))
plt.imshow(wordcloud_disc, interpolation='bilinear')
plt.axis("off")
plt.savefig("worldcloud_materials.png")
plt.show()
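
Building a long list of repeated words works, but the counts can also be handed over directly as a word-to-count mapping. The sketch below only prepares that mapping with the standard library; `PUNCTUATION` and `to_frequencies` are hypothetical names, and the extra punctuation characters are an assumption based on the tokens visible in the output above.

```python
import string

# Assumed extra punctuation seen in the output above (curly quotes, en dash).
PUNCTUATION = set(string.punctuation) | {"’", "“", "”", "–"}


def to_frequencies(sorted_pairs):
    """Turn (word, count) pairs into a dict, dropping zeroed and punctuation-only tokens."""
    return {
        word: count
        for word, count in sorted_pairs
        if count > 0 and not all(ch in PUNCTUATION for ch in word)
    }


sample = [("system", 11), ("oper", 6), ("’", 1), ("i/o", 1), ("?", 0), ("the", 0)]
frequencies = to_frequencies(sample)
```

A dict like `frequencies` can be passed to `WordCloud().generate_from_frequencies(...)` in place of the joined string, which should also keep zeroed stop words and stray punctuation out of the picture.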

Word Enhancement - Spell Checking

Some words end up as stems that are hard to read; "Operating System", for example, is stemmed to "oper" and "system". There has to be a way to catch these and map them back to readable words. This is why we try pyspellchecker (it looks like hunspell is not supported anymore).
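
Entries such as 'conduit' and 'applix' in the counts above suggest that spell-correcting a stem can land on the wrong word. One alternative, which is not what this notebook does, is to remember which original word each stem came from and report the most frequent one. The sketch below is a minimal illustration under that assumption; `toy_stem` and `build_stem_lookup` are hypothetical names, and `toy_stem` is only a stand-in for a real stemmer.

```python
from collections import Counter, defaultdict


def toy_stem(word):
    """Rough stand-in for a real stemmer, for illustration only."""
    for suffix in ("ating", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stem_lookup(tokens, stem=toy_stem):
    """Map each stem to the surface form it most often came from."""
    forms = defaultdict(Counter)
    for token in tokens:
        token = token.lower()
        forms[stem(token)][token] += 1
    return {s: seen.most_common(1)[0][0] for s, seen in forms.items()}


tokens = ["operating", "system", "operating", "systems", "system", "paging"]
lookup = build_stem_lookup(tokens)
```

With a lookup like this, the stem counts stay as they are and only the labels change, so 'oper' would be displayed as 'operating' without asking a dictionary to guess.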

In [117]:
from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))
happening
{'happening', 'penning', 'henning'}
In [145]:
spell.correction("provid")
spell.candidates("ubuntu")
Out[145]:
{'bantu', 'bunte', 'bunty', 'runtu'}
In [ ]:
# The code above has been updated throughout to use the spellchecker.